A novel imbalanced data classification approach using both under and over sampling

Abstract

The performance of data classification degrades when the class distribution is imbalanced. As a result, classifiers tend to favor the majority class, which contains most of the instances. One popular approach is to balance the dataset using over- and under-sampling methods. This paper presents a novel pre-processing technique that performs both over- and under-sampling algorithms for an imbalanced dataset. The proposed method uses the SMOTE algorithm to increase the minority class. Moreover, a cluster-based approach is performed to decrease the majority class, which takes into consideration the new size of the minority class. The experimental results on 10 datasets show that the suggested algorithm performs better in comparison with previous approaches.
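The two stages described in the abstract can be sketched roughly as follows. This is an illustrative sketch in plain Python, not the authors' implementation: the function names (`smote`, `kmeans`, `cluster_undersample`, `balance`), the `oversample_ratio` parameter, the use of k-means as the clustering step, and the rule of keeping the points closest to each centroid in proportion to cluster size are all assumptions made for the example. The only details taken from the abstract are that SMOTE enlarges the minority class and that the cluster-based reduction of the majority class is sized against the new minority count.

```python
import math
import random


def smote(minority, n_new, k=3, rng=random):
    """Create n_new synthetic points, each interpolated between a minority
    point and one of its k nearest minority neighbours (SMOTE's core idea)."""
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted((p for p in minority if p is not base),
                            key=lambda p: math.dist(base, p))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


def kmeans(points, n_clusters, iters=20, rng=random):
    """Plain Lloyd's k-means (assumed here; the abstract only says 'cluster-based')."""
    centroids = rng.sample(points, n_clusters)
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(n_clusters),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Recompute each centroid as the mean of its members; keep it if empty.
        centroids = [tuple(sum(c) / len(m) for c in zip(*m)) if m else centroids[i]
                     for i, m in enumerate(clusters)]
    return centroids, clusters


def cluster_undersample(majority, target_size, n_clusters=3, rng=random):
    """Keep about target_size majority points, drawn from every cluster in
    proportion to its size, preferring points nearest the centroid (assumed rule)."""
    centroids, clusters = kmeans(majority, n_clusters, rng=rng)
    kept, leftover = [], []
    for centre, members in zip(centroids, clusters):
        members.sort(key=lambda p: math.dist(p, centre))
        quota = target_size * len(members) // len(majority)
        kept.extend(members[:quota])
        leftover.extend((math.dist(p, centre), p) for p in members[quota:])
    # Integer division can leave a shortfall; fill it with the closest leftovers.
    leftover.sort()
    kept.extend(p for _, p in leftover[:target_size - len(kept)])
    return kept


def balance(majority, minority, oversample_ratio=2.0, seed=0):
    """Oversample the minority first, then shrink the majority to its new size."""
    rng = random.Random(seed)
    n_new = int(len(minority) * (oversample_ratio - 1))
    new_minority = minority + smote(minority, n_new, rng=rng)
    new_majority = cluster_undersample(majority, len(new_minority), rng=rng)
    return new_majority, new_minority


if __name__ == "__main__":
    rng = random.Random(42)
    majority = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
    minority = [(rng.gauss(3, 0.5), rng.gauss(3, 0.5)) for _ in range(20)]
    new_majority, new_minority = balance(majority, minority)
    print(len(new_majority), len(new_minority))
```

In practice a library such as imbalanced-learn would normally replace the hand-written `smote` and `kmeans` helpers; they are spelled out here only to make the order of operations visible, with the under-sampling target set after over-sampling has finished.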

Related resources

Borderline over-sampling for imbalanced data classification

Traditional classification algorithms often perform poorly on imbalanced data sets in which some classes are heavily outnumbered by the remaining classes. For this kind of data, minority class instances, which are usually of much greater interest, are often misclassified. The paper proposes a method to deal with them by changing class distribution through oversampling at the borderline b...

Classification of Imbalanced Data Using Synthetic Over-Sampling Techniques

A Novel Approach to Handle Imbalanced Data for Classification

This paper proposes a particle swarm K-means optimization (PSKO)-based granular computing (GrC) model to preprocess the skewed class distribution in order to enhance the classification accuracy for the class imbalance problem. The GrC model acquires knowledge from information granules rather than from numerical data. It also processes multi-dimensional and sparse data by using singular v...

CUSBoost: Cluster-based Under-sampling with Boosting for Imbalanced Classification

Class imbalance classification is a challenging research problem in data mining and machine learning, as most real-life datasets are imbalanced in nature. Existing learning algorithms maximise the classification accuracy by correctly classifying the majority class, but misclassify the minority class. However, the minority class instances represent the concept with greater in...

The Imbalanced Training Sample Problem: Under or over Sampling?

The problem of imbalanced training sets in supervised pattern recognition methods is receiving growing attention. Imbalanced training sample means that one class is represented by a large number of examples while the other is represented by only a few. It has been observed that this situation, which arises in several practical domains, may produce an important deterioration of the classificatio...

Journal

Journal title: Bulletin of Electrical Engineering and Informatics

Year: 2021

ISSN: 2302-9285

DOI: https://doi.org/10.11591/eei.v10i5.2785